(54) Hybrid multi-channel/cue coding/decoding of audio signals



(57) Part of the spectrum of two or more input sig- 
nals is encoded using conventional coding techniques, 
while the rest of the spectrum is encoded using binaural
cue coding (BCC). In BCC coding, spectral components 
of the input signals are downmixed and BCC parame- 
ters (e.g., inter-channel level and/or time differences) 
are generated. In a stereo implementation, after con- 
verting the left and right channels to the frequency do- 
main, pairs of left- and right-channel spectral compo- 
nents are downmixed to mono. The mono components
are then converted back to the time domain, along with
those left- and right-channel spectral components that 
were not downmixed, to form hybrid stereo signals, 
which can then be encoded using conventional coding 
techniques. For playback, the encoded bitstream is de- 
coded using conventional decoding techniques. BCC 
synthesis techniques may then apply the BCC parame- 
ters to synthesize an auditory scene based on the mono 
components as well as the unmixed stereo components. 
Description

[0001] The present invention relates to the encoding 
of audio signals and the subsequent decoding of the en-
coded audio signals to generate an auditory scene during
playback.

[0002] In conventional stereo audio coding, the sum 
and the difference of the left and right audio channels of 
the stereo input signal are formed and then individually 
coded, e.g., using adaptive differential pulse code mod- 
ulation (ADPCM) or some other suitable audio coding 
algorithm, to form an encoded audio bitstream. The cor- 
responding conventional stereo audio decoding in-
volves reversing the (e.g., ADPCM) coding algorithm to re-
cover decoded sum and difference signals, from which
left and right audio channels of a decoded stereo output 
signal are generated. 

Although such conventional stereo audio coding/decod- 
ing (codec) techniques can produce an auditory scene 
during playback that accurately reflects the fidelity of the
stereo input signal, the amount of data required for the 
corresponding encoded audio bitstream may be prohib- 
itively large for some applications involving limited stor- 
age space and/or transmission bandwidth. 
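The sum/difference arrangement described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the per-signal coding step (e.g., ADPCM) is deliberately omitted, so the sketch shows only how the sum and difference signals are formed and how the left and right channels are recovered from them.

```python
def sum_difference_encode(left, right):
    """Form the sum and difference signals from equal-length sample lists."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def sum_difference_decode(total, diff):
    """Recover the left and right channels from the sum and difference signals."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right


# Illustrative samples only; a real codec would code `total` and `diff`
# (e.g., with ADPCM) before transmission.
left_in = [0.5, 0.25, -0.5, 1.0]
right_in = [0.5, -0.25, 0.0, 1.0]
total, diff = sum_difference_encode(left_in, right_in)
left_out, right_out = sum_difference_decode(total, diff)
```

Without the lossy coding step the round trip is exact, which makes clear that any data reduction in such a codec comes from how the sum and difference signals are coded, not from the sum/difference transform itself.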
[0003] US Patent Application Serial Nos. 09/848,877,
10/045,458, and 10/155,437 describe audio codec tech-
niques that can produce smaller encoded audio bit-
streams for the same or substantially similar levels of 
playback fidelity as those associated with conventional 
stereo audio codecs. In particular, these patent applica- 
tions are related to an audio coding technique referred 
to as binaural cue coding (BCC). 
[0004] When BCC coding is applied to stereo audio, 
the left and right channels of the stereo input signal are 
downmixed (e.g., by summing) to a single mono signal, 
which is then encoded using a suitable conventional au- 
dio coding algorithm such as ADPCM. In addition, the 
left and right channels are analyzed to generate a 
stream of BCC parameters. In one implementation, for 
each audio frame (e.g., 20 msec), the BCC parameters 
include auditory spatial cues such as an inter-channel 
or inter-aural level difference (ILD) value and an inter- 
channel or inter-aural time difference (ITD) value be- 
tween the left and right channels for each of a plurality 
of different frequency bands in the stereo input signal. 
Since the corresponding encoded audio data might in- 
clude only an encoded mono signal and a stream of 
BCC parameters, the amount of encoded data may be 
considerably smaller (e.g., 50-80%) than that for a cor- 
responding encoded audio bitstream generated using 
conventional stereo audio coding, such as that de- 
scribed previously. 
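As a rough illustration of the per-band level cue described above, the following Python sketch computes one inter-channel level difference per frequency band from left and right spectral components. The definition used here (band energy ratio in dB, positive when the left channel is louder), the function names, and the band layout are assumptions made for illustration; the referenced applications define the actual BCC parameters, and time differences are not computed here.

```python
import math


def band_energy(components):
    """Sum of squared magnitudes of the spectral components in one band."""
    return sum(abs(c) ** 2 for c in components)


def level_difference_db(left_band, right_band, floor=1e-12):
    """Level difference in dB for one band (assumed convention: left over right)."""
    e_left = max(band_energy(left_band), floor)    # floor avoids log of zero
    e_right = max(band_energy(right_band), floor)
    return 10.0 * math.log10(e_left / e_right)


def level_cues(left_spec, right_spec, band_edges):
    """One level difference per band; band_edges holds bin indices b0 < b1 < ..."""
    return [
        level_difference_db(left_spec[lo:hi], right_spec[lo:hi])
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


# Two bands of two bins each; the left channel is louder in the first band only.
left_spec = [2 + 0j, 2 + 0j, 1 + 0j, 1 + 0j]
right_spec = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
cues = level_cues(left_spec, right_spec, [0, 2, 4])
```

Because only one such value per band per frame is sent alongside the mono signal, the side information stays small compared with coding a second audio channel.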

[0005] The corresponding BCC decoding involves re- 
versing the (e.g., ADPCM) coding algorithm to recover 
a decoded mono signal. Stereo audio synthesis tech- 
niques are then applied to the decoded mono signal us- 
ing the BCC parameters to generate left and right chan- 
nels of a decoded stereo audio signal for playback. Al-
though typically lower than that achieved using conven-
tional stereo audio codecs, the fidelity of an auditory
scene generated using BCC coding and decoding may
be acceptable for many applications, while typically us-
ing lower bandwidth.

[0006] Embodiments of the present invention are re- 
lated to a hybrid audio codec technique in which con- 
ventional audio coding is applied to certain frequency 
bands of the input audio signals, while BCC coding is
applied to other frequency bands of the input audio sig-
nals. In one possible stereo implementation, signal
spectral components whose frequencies are above a spec-
ified threshold frequency (e.g., 1.5 kHz) are coded using
BCC coding, while lower-frequency components are
coded using conventional stereo coding. As a result,
higher-fidelity playback can be achieved than with BCC
coding alone, while still reducing the total amount of
encoded data compared to conventional stereo coding.
[0007] According to one embodiment, the present in-
vention is a method for encoding N input audio signals,
where N>1. Each of the N input audio signals is convert-
ed into a plurality of spectral components in a frequency
domain. For each of one or more, but not all, of the spec-
tral components, the spectral components correspond-
ing to the N input audio signals are downmixed to gen-
erate a downmixed spectral component, leaving one or
more of the spectral components for each of the N input
audio signals unmixed. An encoded audio bitstream is
generated based on the one or more downmixed spec-
tral components and one or more unmixed spectral
components.

[0008] Preferably the one or more auditory spatial pa- 
rameters include one or more of an inter-channel level 
difference and an inter-channel time difference. 

[0009] Preferably the one or more downmixed spec-
tral components have frequencies above a specified
threshold frequency, and the one or more unmixed 
spectral components have frequencies below the spec- 
ified threshold frequency. 

[0010] Preferably the specified threshold frequency
varies dynamically over time.

[0011] Preferably the specified threshold frequency 
varies as a function of bit rate. 

[0012] Preferably the one or more downmixed spec-
tral components have spectral energies below a speci-
fied threshold energy, and the one or more unmixed
spectral components have spectral energies above the 
specified threshold energy. 

[0013] According to another embodiment, the present 
invention is an encoded audio bitstream generated by
performing the previously recited method. 

EP 1 376 538 A1

[0014] According to another embodiment, the present
invention is an apparatus for processing N input audio
signals, where N>1, for encoding. One or more trans-
forms are configured to convert each of the N input audio
signals into a plurality of spectral components in a fre-
quency domain. A downmixer is configured, for each of
one or more, but not all, of the spectral components, to
downmix the spectral components corresponding to the
N input audio signals to generate a downmixed spectral
component, leaving one or more of the spectral compo-
nents for each of the N input audio signals unmixed.
[0015] Preferably, the apparatus further comprises 
one or more inverse transforms configured to convert 
the one or more downmixed spectral components and 
the one or more unmixed spectral components into N 
hybrid audio signals. 

[0016] Preferably, the apparatus further comprises an 
audio coder configured to generate an encoded audio 
bitstream based on the one or more downmixed spectral 
components and the one or more unmixed spectral 
components. 

[0017] Preferably, the apparatus further comprises a 
generator configured to generate one or more auditory 
spatial parameters for the one or more downmixed
spectral components.
[0018] Preferably, N=2, the two input audio signals 
correspond to left and right input audio signals of a ster- 
eo input audio signal, each downmixed spectral compo-
nent is a mono spectral component, and a stereo audio
coder can generate an encoded audio bitstream based
on the one or more downmixed spectral components 
and the one or more unmixed spectral components. 
[0019] Preferably, the one or more downmixed spec- 
tral components have frequencies above a specified 
threshold frequency, and the one or more unmixed 
spectral components have frequencies below the spec- 
ified threshold frequency. 

[0020] According to another embodiment, the present 
invention is a method for decoding an encoded audio 
bitstream. The encoded audio bitstream is decoded to 
generate a plurality of spectral components in a frequen- 
cy domain, wherein one or more sets of the spectral 
components correspond to downmixed spectral compo- 
nents, and one or more sets of the spectral components 
correspond to unmixed spectral components. For each 
set of the downmixed spectral components, one or more 
auditory spatial parameters are applied to generate a 
synthesized spectral component. The synthesized
spectral components and the unmixed spectral compo-
nents are converted into N decoded audio signals in a
time domain, where N>1.

[0021] Preferably the one or more downmixed spec-
tral components have frequencies above a specified
threshold frequency and the one or more unmixed 
spectral components have frequencies below the spec- 
ified threshold frequency. 

[0022] According to another embodiment, the present 
invention is an apparatus for decoding an encoded au- 
dio bitstream. An audio decoder is configured to decode 
the encoded audio bitstream to generate a plurality of 
spectral components in a frequency domain, wherein 
one or more sets of the spectral components corre- 
spond to downmixed spectral components, and one or 
more sets of the spectral components correspond to un- 
mixed spectral components. A synthesizer is config-
ured, for each set of the downmixed spectral compo-
nents, to apply one or more auditory spatial parameters
to generate a synthesized spectral component. One or
more inverse transforms are configured to convert the
synthesized spectral components and the unmixed
spectral components into N decoded audio signals in a
time domain, where N>1.

[0023] Preferably the audio decoder is configured to
decode the encoded audio bitstream to generate N hy-
brid audio signals, and the apparatus further comprises
one or more transforms configured to convert each of the
N hybrid audio signals into the plurality of spectral
components in the frequency domain.

[0024] Preferably N=2, the encoded audio bitstream
is decoded using a stereo audio decoder, the two hybrid
audio signals correspond to left and right hybrid audio
signals of a hybrid stereo audio signal, and each down-
mixed spectral component is a mono spectral compo-
nent.

[0025] Preferably the one or more downmixed spec-
tral components have frequencies above a specified
threshold frequency and the one or more unmixed spec- 
tral components have frequencies below the specified 
threshold frequency. 

[0026] Other aspects, features, and advantages of
the present invention will become more fully apparent
from the following detailed description, the appended
claims, and the accompanying drawings in which:

Fig. 1 shows a block diagram of a hybrid audio sys-
tem, according to one embodiment of the present
invention;

Fig. 2 shows a block diagram of the processing im-
plemented by the BCC analyzer/mixer of Fig. 1, ac-
cording to one embodiment of the present inven-
tion; and

Fig. 3 shows a block diagram of the processing im-
plemented by the BCC synthesizer of Fig. 1, accord-
ing to one embodiment of the present invention.

[0027] Fig. 1 shows a block diagram of a hybrid audio
system 100, according to one embodiment of the
present invention. Audio system 100 comprises trans-
mitter 102 and receiver 104. Transmitter 102 receives
the left (L) and right (R) channels of an input stereo audio
signal and generates an encoded audio bitstream 106
and a corresponding stream 108 of BCC parameters,
which, depending on the implementation, may or may
not be explicitly encoded into bitstream 106. Fig. 1
shows BCC parameter stream 108 being transmitted
out-of-band from transmitter 102 to receiver 104. In ei-
ther case, receiver 104 receives the data generated by
transmitter 102, decodes encoded audio bitstream 106,
and applies the BCC parameters in stream 108 to gen-
erate the left (L') and right (R') channels of a decoded
stereo audio signal.

[0028] More particularly, transmitter 102 comprises 
BCC analyzer/mixer 110 and stereo audio coder 112, 
while receiver 104 comprises stereo audio decoder 114
and BCC synthesizer 116. 

[0029] In transmitter 102, BCC analyzer/mixer 110 
converts the left (L) and right (R) audio signals into the 
frequency domain. For spectral components above a 
specified threshold frequency, BCC analyzer/mixer 110 
generates BCC parameters for stream 108 using the 
BCC techniques described in the '877, '458, and '437
applications. BCC analyzer/mixer 110 also downmixes
those high-frequency components to mono. Copies of
the high-frequency mono component are then convert-
ed back to the time domain in combination with the low-
frequency "unmixed" left and right components (i.e., the
unprocessed frequency-domain components below the 
specified threshold frequency), respectively, to form hy- 
brid left and right signals 118. Stereo audio coder 112 
applies conventional stereo coding to these hybrid left 
and right signals to generate encoded audio bitstream 
106. 

[0030] Fig. 2 shows a block diagram of the processing
implemented by BCC analyzer/mixer 110 of Fig. 1, ac-
cording to one embodiment of the present invention.
Fast Fourier transform (FFT) 202L converts the left au-
dio signal L into a plurality of left-channel spectral com-
ponents 204 in the frequency domain. Similarly, FFT
202R converts the right audio signal R into a plurality of
right-channel spectral components 206 in the frequency
domain. The one or more left-channel components
204HI and the corresponding one or more right-channel
components 206HI whose frequencies are above the
specified threshold frequency are applied to both down-
mixer 208 and BCC parameter generator 216.
[0031] Downmixer 208 combines each high-frequen-
cy left-channel component 204HI with its corresponding
high-frequency right-channel component 206HI to form
a high-frequency mono component 210HI. The process-
ing performed by downmixer 208 to generate the mono
components may vary from implementation to imple-
mentation. In one possible implementation, downmixer
208 simply averages the corresponding left- and right-
channel components. In another possible implementa-
tion, downmixer 208 implements the downmixing tech-
nique described in the 'xxx application. Those skilled in
the art will appreciate that other suitable downmixing al-
gorithms are possible.

[0032] Replicator 212 generates two copies of each
high-frequency mono component 210HI for application
to left and right inverse FFTs (IFFTs) 214L and 214R,
respectively. IFFTs 214L and 214R also receive the low-
frequency left and right components 204LO and 206LO,
respectively, from FFTs 202L and 202R. IFFTs 214L and
214R convert their respective sets of components back
to the time domain to generate the left and right hybrid
signals 118L and 118R, respectively. The resulting two-
channel signal contains identical frequency compo-
nents within spectral regions that were converted to mo-
no, with the remaining parts being identical to the input
signals L and R. As a result, stereo audio coder 112 will
typically generate an encoded audio bitstream that has
fewer bits than if it were to encode the original input ster-
eo audio signal (L and R).

[0033] BCC parameter generator 216 analyzes the
high-frequency left and right components 204HI and
206HI to generate BCC parameters for stream 108 of
Fig. 1 for each frequency band above the specified
threshold frequency.
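The signal path of Fig. 2 (transform, downmix above the threshold, replicate, inverse transform) can be sketched in Python as follows. This is a minimal single-frame illustration under stated assumptions: a plain DFT stands in for the FFTs, no windowing or overlap is used, the downmix is the simple average mentioned above, and all function names are hypothetical. BCC parameter generation is left out.

```python
import cmath
import math


def dft(samples):
    """Plain discrete Fourier transform (stand-in for FFTs 202L/202R)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT returning the real part (stand-in for IFFTs 214L/214R)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def is_high(k, n, sample_rate, threshold_hz):
    """True if bin k lies above the threshold; mirrored bins fold onto k."""
    return min(k, n - k) * sample_rate / n > threshold_hz


def analyze_and_mix(left, right, sample_rate, threshold_hz):
    """Return hybrid left/right signals for one frame of equal-length input."""
    n = len(left)
    spec_l, spec_r = dft(left), dft(right)
    hybrid_l, hybrid_r = list(spec_l), list(spec_r)
    for k in range(n):
        if is_high(k, n, sample_rate, threshold_hz):
            mono = 0.5 * (spec_l[k] + spec_r[k])  # downmix: simple average
            hybrid_l[k] = mono                    # replicate the mono component
            hybrid_r[k] = mono                    # into both channels
    return idft_real(hybrid_l), idft_real(hybrid_r)


# 16 samples at 8 kHz give 500 Hz bin spacing: a 500 Hz tone in the left
# channel only, and a 2 kHz tone in both channels at different levels.
n, rate = 16, 8000.0
low = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]
high = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
left = [a + b for a, b in zip(low, high)]
right = [0.5 * b for b in high]
hybrid_left, hybrid_right = analyze_and_mix(left, right, rate, 1500.0)
```

In this example the 500 Hz tone stays in the left hybrid signal only, while the 2 kHz tone appears in both hybrid signals at the averaged level, which is the redundancy a conventional stereo coder can then exploit.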

[0034] Referring again to Fig. 1, in receiver 104, ster-
eo audio decoder 114 applies a conventional stereo de-
coding algorithm (e.g., to reverse the coding implement-
ed by coder 112) to recover hybrid decoded left and right
signals 120. BCC synthesizer 116 applies BCC synthe-
sis techniques to the high-frequency portions of chan-
nels 120 to synthesize the high-frequency portions of
the decoded left (L') and right (R') channels. In particular,
BCC synthesizer 116 converts the hybrid channels 120
to the frequency domain, applies the BCC parameters
to the high-frequency components to synthesize high-
frequency left and right components using the BCC
techniques described in the '877, '458, and '437 appli-
cations, and then reconverts the resulting synthesized
high-frequency components and corresponding decod-
ed low-frequency components to the time domain.

[0035] Fig. 3 shows a block diagram of the processing
implemented by BCC synthesizer 116 of Fig. 1, accord-
ing to one embodiment of the present invention. FFT
302L converts hybrid left audio signal 120L from stereo
audio decoder 114 into a plurality of left-channel spectral
components 304 in the frequency domain. Similarly,
FFT 302R converts hybrid right audio signal 120R from
decoder 114 into a plurality of right-channel spectral
components 306 in the frequency domain. The one or
more left-channel components 304HI and the corre-
sponding one or more right-channel components 306HI
whose frequencies are above the specified threshold
frequency are applied to mono signal generator 308.
[0036] Mono signal generator 308 generates a high-
frequency mono component for each high-frequency
left-channel component 304HI and its corresponding
high-frequency right-channel component 306HI. Ideally,
since replicator 212 of Fig. 2 generated identical copies
of each high-frequency mono component 210HI, each
high-frequency left-channel component 304HI should be
identical to its corresponding high-frequency right-chan-
nel component 306HI. As such, mono signal generator
308 could simply select either the left channel or the right
channel to "generate" the one or more high-frequency
mono components 310HI. Alternatively, mono signal
generator 308 could simply average or perform some
other suitable downmixing algorithm, including the algo-
rithm described in the 'xxx application, to generate each
mono component 310HI, in order to account for any real-
world differences that may exist between the left and
right high-frequency components 304HI and 306HI.

[0037] In any case, BCC stereo synthesizer 312 ap-
plies BCC processing to generate a high-frequency left-
channel component 314HI and a high-frequency right-
channel component 316HI for each high-frequency mo-
no component 310HI. The high-frequency left- and right-
channel components 314HI and 316HI are applied to left
and right IFFTs 318L and 318R, respectively. IFFTs 318L
and 318R also receive the low-frequency left and right
components 304LO and 306LO, respectively, from FFTs
302L and 302R. IFFTs 318L and 318R convert their re-
spective sets of components back to the time domain to
generate left and right channels L' and R', respectively,
of the decoded stereo signal of Fig. 1.
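The receiver-side steps for the high-frequency components can be sketched in Python under simplifying assumptions. The sketch averages the decoded left and right components to obtain the mono component (one of the options named above) and then applies only a level difference to produce left and right components. The gain normalization (squared gains summing to 2), the sign convention (positive values favor the left channel), and the function names are assumptions for illustration; the synthesis techniques of the referenced applications, including any time-difference processing, are not reproduced here.

```python
import math


def mono_components(left_high, right_high):
    """Average corresponding decoded high-frequency components."""
    return [0.5 * (l + r) for l, r in zip(left_high, right_high)]


def level_gains(level_difference_db):
    """Left/right gains whose ratio matches the level difference in dB.

    Assumed normalization: g_left**2 + g_right**2 == 2, so a 0 dB
    difference leaves the mono component unchanged in both channels.
    """
    ratio = 10.0 ** (level_difference_db / 20.0)  # amplitude ratio left/right
    g_right = math.sqrt(2.0 / (1.0 + ratio * ratio))
    return ratio * g_right, g_right


def synthesize_band(mono, level_difference_db):
    """Scale the mono components of one band into left and right components."""
    g_left, g_right = level_gains(level_difference_db)
    return [g_left * m for m in mono], [g_right * m for m in mono]


# Decoded hybrid channels carry (ideally identical) high-frequency components.
decoded_left_high = [1 + 1j, 2 + 0j]
decoded_right_high = [1 + 1j, 2 + 0j]
mono = mono_components(decoded_left_high, decoded_right_high)
left_high, right_high = synthesize_band(mono, 6.0)
```

The synthesized components would then be combined with the unmodified low-frequency components and passed to the inverse transforms.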
[0038] A natural cross-over frequency from the "true" 
stereo part to the BCC-generated stereo part is 1.5 kHz.
Above that frequency the human auditory system does 
not substantially evaluate inter-aural phase differences 
for sound localization. Thus, the human auditory system 
is less sensitive to inter-channel phase errors intro- 
duced by BCC processing in that range. Moreover, the 
most salient auditory localization cues are usually de-
rived from low-frequency components, unless the audio
signal has dominant spectral energy at higher frequen- 
cies. 

[0039] The present invention can also be implement- 
ed using a hybrid transmitter such as transmitter 102 of 
Fig. 1, but a receiver that does not perform any BCC
processing. In this case, BCC synthesizer 116 of Fig. 1 
may be omitted from receiver 104, and the resulting re- 
ceiver can ignore BCC parameter stream 108 during de- 
coding processing. Legacy receivers that contain only 
a conventional audio decoder fall into that category. 
Such a receiver would not provide BCC spatialization of 
the auditory image for spectral parts of the decoded au- 
dio signals that are based on mono components. How- 
ever, there is still a remaining stereo effect created by 
those parts of the spectrum that are preserved as ster- 
eo. This stereo effect by itself provides a mechanism for 
bit-rate reduction as compared to the transmission of the 
full-bandwidth stereo. Explicitly, mixing parts of the 
spectrum of the audio input signal to mono reduces the 
bit rate of a conventional audio coder. The spatial image 
degradation should be tolerable, if the mono part of the 
spectrum is limited to frequencies above about 1 kHz. 
[0040] For some applications, BCC processing may 
be intentionally limited to transmit only inter-channel lev- 
el differences as the BCC parameters (i.e., and not any 
inter-channel time differences). For headphone play- 
back, inter-channel time differences are important for 
creating a natural spatial image, especially at frequen-
cies below 1.5 kHz. By keeping the stereo signal up to
a limit of about 1.5 kHz, the spatial cues in that frequency
range are available at the receiver and greatly improve the
listening experience with headphones.
[0041] Transmitting a small spectral bandwidth as a 
stereo signal does not necessarily increase the bit rate 
of the audio coder dramatically compared to applying 
BCC processing to the full spectral range. The audio 
coder can still take full advantage of those parts of the 
spectrum that are mono by using, e.g., sum/difference 
coding. The data rate for the BCC parameters can be
reduced, since no parameters need to be transmitted
for the spectral part that is kept stereo. 
[0042] The application of BCC processing to spectral
regions can be made adaptive such that an optimum
quality/bit-rate tradeoff is achieved. For instance, BCC
processing could be switched off for very critical mate-
rial, or it could be applied to the full spectrum for non-
critical material. The spectral region where BCC
processing is applied can be controlled, for instance, by
one parameter per frame that indicates the upper fre-
quency bound up to which the stereo signal is kept for
encoding. In addition, the threshold frequency between
stereo and BCC coding could dynamically change
based on the number of bits that would actually be used
to code different spectral regions of the audio data by
the different techniques.
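One way to picture the per-frame control parameter is the following Python sketch, which picks how many bands (counted from the low-frequency end) are kept as stereo so that the frame fits a bit budget. The per-band bit estimates, the budget, and the selection rule are hypothetical values and assumptions chosen for illustration; the text above does not prescribe a particular rule.

```python
def choose_stereo_band_count(stereo_bits, bcc_bits, budget_bits):
    """Return the largest number of low bands that can stay stereo.

    stereo_bits[i]: estimated bits to code band i as conventional stereo.
    bcc_bits[i]:    estimated bits to code band i as mono plus BCC parameters.
    Bands below the returned count are kept stereo; the rest use BCC.
    Falls back to 0 (BCC on the full spectrum) if nothing fits the budget.
    """
    band_count = len(stereo_bits)
    for kept in range(band_count, -1, -1):
        cost = sum(stereo_bits[:kept]) + sum(bcc_bits[kept:])
        if cost <= budget_bits:
            return kept
    return 0


# Hypothetical per-band estimates for one frame, lowest band first.
stereo_estimate = [400, 300, 300, 200]
bcc_estimate = [150, 120, 120, 90]
kept_bands = choose_stereo_band_count(stereo_estimate, bcc_estimate, 900)
```

A returned count equal to the number of bands corresponds to BCC processing being switched off, and a count of zero corresponds to BCC processing on the full spectrum, matching the two extremes mentioned above.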

[0043] The audio quality range covered by the hybrid
codec scheme in Fig. 1 reaches transparent quality
when the spectral region of BCC processing has zero
bandwidth. With continuously increasing bandwidth for
BCC processing, a gradual quality transition from tradi-
tional stereo audio coding to the original full-bandwidth
BCC coding scheme of the '877, '458, and '437 applica-
tions is possible. Therefore, the quality range of the
present invention extends to both quality ranges: that of
the original BCC scheme and that of the traditional audio
coding scheme.

[0044] Moreover, the hybrid coding scheme is inher-
ently bit-rate scalable. In terms of the coder structure,
such a scheme is also referred to as "layered coding."
This feature can be used, for instance, to reduce the bit
rate of a given bitstream to accommodate channels
with lower capacity. For such purposes, the BCC param-
eters can be removed from the bitstream. In that case,
a receiver is still able to decode an audio signal with a
reduced stereo image, as described above for the leg-
acy decoder. A further step for reducing the bit rate is
meaningful if the stereo audio coder uses sum/differ-
ence coding. It is possible to isolate the difference signal
information in the bitstream and remove it. In this case,
the receiver will decode only the sum signal, which is a
monophonic audio signal.
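The layer-dropping order described above can be sketched in Python. The layer names, the sizes, and the representation of a bitstream as a mapping from layer name to size in bits are hypothetical; the sketch only illustrates removing the BCC parameters first and the difference signal second until the stream fits the channel.

```python
def scale_layers(layer_bits, capacity_bits):
    """Drop layers until the total fits the capacity.

    layer_bits maps a layer name ("sum", "difference", "bcc") to its size
    in bits. The BCC layer is removed first, then the difference layer;
    the sum layer is always kept, even if it alone exceeds the capacity.
    """
    kept = dict(layer_bits)
    for name in ("bcc", "difference"):
        if sum(kept.values()) <= capacity_bits:
            break
        kept.pop(name, None)
    return kept


# Hypothetical layer sizes for one second of audio.
layers = {"sum": 48000, "difference": 16000, "bcc": 2000}
full = scale_layers(layers, 70000)       # everything fits
reduced = scale_layers(layers, 64000)    # BCC parameters removed
mono_only = scale_layers(layers, 50000)  # only the sum signal remains
```

A receiver given `reduced` behaves like the legacy decoder described earlier, and a receiver given `mono_only` decodes a monophonic signal.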

[0045] The different "layers" (e.g., sum, difference,
and BCC information) also provide a natural division of
the bitstream for unequal error protection for lossy chan-
nels. For such applications, the sum signal would get
the highest protection and the BCC information would
get the lowest protection. If the channel temporarily has
a high error rate, then the mono sum signal might still
be recoverable, while the difference signal and BCC in-
formation might be lost. Such a scheme avoids more
audibly annoying frame concealment mechanisms.
[0046] Although the present invention has been de-
scribed in the context of applications in which BCC
processing is applied to all and only frequency bands
above a specified threshold frequency, the present in-
vention is not so limited. In general, for the hybrid
processing of the present invention, BCC processing
can be applied to any one or more - but less than all -
frequency bands, whether they are contiguous or not,
and independent of any threshold frequency.
[0047] For example, in one possible implementation,
BCC processing is applied to only those frequency
bands with energy levels below a specified threshold en-
ergy, while conventional stereo encoding is applied to
the remaining frequency bands. In this way, convention-
al stereo encoding optimizes fidelity for the "important"
(i.e., high spectral energy) frequency bands, while BCC 
processing optimizes bandwidth for the less-important 
(i.e., low spectral energy) frequency bands. 
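The energy-based selection can be illustrated with a short Python sketch. How band energy is measured (here, the summed squared magnitudes of both channels), the band layout, and the function name are assumptions for illustration only.

```python
def split_bands_by_energy(left_spec, right_spec, band_edges, threshold_energy):
    """Return (bcc_bands, stereo_bands) as lists of band indices.

    A band goes to BCC processing when its combined left/right energy is
    below threshold_energy; otherwise it is kept for conventional stereo
    coding. band_edges holds bin indices b0 < b1 < ... delimiting the bands.
    """
    bcc_bands, stereo_bands = [], []
    for band, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        energy = (sum(abs(c) ** 2 for c in left_spec[lo:hi])
                  + sum(abs(c) ** 2 for c in right_spec[lo:hi]))
        if energy < threshold_energy:
            bcc_bands.append(band)
        else:
            stereo_bands.append(band)
    return bcc_bands, stereo_bands


# Three bands of two bins each; only the middle band is weak.
left_spec = [4.0, 4.0, 0.1, 0.1, 2.0, 2.0]
right_spec = [3.0, 3.0, 0.1, 0.1, 0.2, 0.2]
bcc_bands, stereo_bands = split_bands_by_energy(
    left_spec, right_spec, [0, 2, 4, 6], threshold_energy=1.0)
```

Note that the selected BCC bands need not be contiguous or lie above any particular frequency, consistent with the preceding paragraph.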
[0048] Although the present invention has been de- 
scribed in the context of encoding and decoding a stereo 
audio signal, the present invention can also be applied 
to multi-channel applications having more than two input 
and output channels. Furthermore, the present inven- 
tion can be applied to applications in which the number 
of input channels differs from (either higher or lower 
than) the number of output channels. 
[0049] Although the present invention has been de- 
scribed in the context of receivers that apply the BCC 
techniques of the '877, '458, and '437 applications to 
synthesize auditory scenes, the present invention can 
also be implemented in the context of receivers that ap- 
ply other techniques for synthesizing auditory scenes 
that do not necessarily rely on the techniques of the 
'877, '458, and '437 applications.
[0050] Although the present invention has been de- 
scribed in the context of a real-time system in which the 
generated data are transmitted immediately from the 
transmitter to the receiver for real-time decoding and 
playback, the invention is not so limited. For example, 
the data generated by the transmitter may be stored in 
computer memory or other electronic storage medium 
for subsequent, non-real-time playback by one or more 
receivers. 

[0051] Although the present invention has been de- 
scribed in the context of embodiments having an audio 
coder (e.g., stereo coder 112 of Fig. 1) that encodes hy- 
brid signals in the time domain to generate an encoded 
audio bitstream and an audio decoder (e.g., stereo de- 
coder 114) that decodes the encoded audio bitstream 
to recover decoded hybrid signals in the time domain, 
the present invention is not so limited. Those skilled in 
the art will understand that the present invention can be 
implemented in the context of embodiments that code 
and decode audio data in the frequency domain. For ex-
ample, the embodiment of Figs. 1-3 can be modified to
replace stereo audio coder 112 and stereo audio decod-
er 114 with audio codecs that encode and decode, re-
spectively, audio data in the frequency domain. In that
case, BCC analyzer/mixer 110 of Fig. 2 can be modified
to eliminate replicator 212 and IFFTs 214, and BCC syn-
thesizer 116 of Fig. 3 can be modified to eliminate FFTs
302 and mono signal generator 308. In that case, down-
mixed (i.e., mono) spectral components 210HI generat-
ed by downmixer 208 and unmixed spectral compo-
nents 204LO and 206LO are passed directly to the fre-
quency-domain audio coder in the transmitter. Similarly,
the corresponding downmixed (i.e., mono) and unmixed
spectral components recovered by the frequency-do-
main audio decoder in the receiver are passed directly
to BCC stereo synthesizer 312 and IFFTs 318, respec-
tively.

[0052] The present invention may be implemented as
circuit-based processes, including possible implemen-
tation on a single integrated circuit. As would be appar-
ent to one skilled in the art, various functions of circuit
elements may also be implemented as processing steps
in a software program. Such software may be employed
in, for example, a digital signal processor, micro-control-
ler, or general-purpose computer.

[0053] The present invention can be embodied in the
form of methods and apparatuses for practicing those
methods. The present invention can also be embodied
in the form of program code embodied in tangible media,
such as floppy diskettes, CD-ROMs, hard drives, or any
other machine-readable storage medium, wherein,
when the program code is loaded into and executed by
a machine, such as a computer, the machine becomes
an apparatus for practicing the invention. The present
invention can also be embodied in the form of program
code, for example, whether stored in a storage medium,
loaded into and/or executed by a machine, or transmit-
ted over some transmission medium or carrier, such as
over electrical wiring or cabling, through fiber optics, or
via electromagnetic radiation, wherein, when the pro-
gram code is loaded into and executed by a machine,
such as a computer, the machine becomes an appara-
tus for practicing the invention. When implemented on
a general-purpose processor, the program code seg-
ments combine with the processor to provide a unique
device that operates analogously to specific logic cir-
cuits.

[0054] It will be further understood that various chang- 
es in the details, materials, and arrangements of the 
parts which have been described and illustrated in order
to explain the nature of this invention may be made by 
those skilled in the art without departing from the scope 
of the invention as expressed in the following claims. 




Claims 

1. A method for encoding N input audio signals, N>1,
comprising the steps of:

(a) converting each of the N input audio signals
into a plurality of spectral components in a fre-
quency domain;

(b) for each of one or more, but not all, of the
spectral components, downmixing the spectral
components corresponding to the N input audio
signals to generate a downmixed spectral com-
ponent, leaving one or more of the spectral
components for each of the N input audio sig-
nals unmixed; and

(c) generating an encoded audio bitstream
based on the one or more downmixed spectral
components and one or more unmixed spectral
components.

2. A method of generating an encoded audio bit- 
stream, comprising the steps of: 

(a) converting each of N input audio signals, 
N>1, into a plurality of spectral components in
a frequency domain; 

(b) for each of one or more, but not all, of the
spectral components, downmixing the spectral
components corresponding to the N input audio
signals to generate a downmixed spectral component,
leaving one or more of the spectral components
for each of the N input audio signals unmixed; and

(c) generating the encoded audio bitstream 
based on the one or more downmixed spectral 
components and one or more unmixed spectral 
components. 

3. The method as claimed in claim 1 or 2, wherein step 
(c) comprises the steps of: 

(1) converting the one or more downmixed 
spectral components and the one or more unmixed
spectral components into N hybrid audio
signals in a time domain; and 

(2) applying an audio coding algorithm to the N 
hybrid audio signals to generate the encoded 
audio bitstream.
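
By way of illustration only, step (1) of claim 3 may be sketched as follows. The sketch assumes that each downmixed component is an across-channel average written back into every channel before the inverse transform, and that the downmixed bins are those at or above a cutoff index; both are assumptions made for the example, as are the names `hybrid_signals` and `first_downmixed_bin`. Step (2), the audio coding algorithm, is not shown.

```python
import cmath

def dft(x):
    """Naive forward transform to the frequency domain."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse transform back to the time domain."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def hybrid_signals(channels, first_downmixed_bin):
    """Return N time-domain hybrid signals.

    Assumption: bins whose frequency index (measured as distance from
    bin 0, so mirrored bins are treated alike) is at or above
    first_downmixed_bin are replaced in every channel by the
    across-channel average; lower bins are left unmixed.
    """
    n = len(channels[0])
    if not 0 < first_downmixed_bin <= n // 2:
        raise ValueError("cutoff must leave some bins unmixed and some downmixed")
    spectra = [dft(ch) for ch in channels]
    for k in range(n):
        if min(k, n - k) >= first_downmixed_bin:
            mono = sum(s[k] for s in spectra) / len(spectra)
            for s in spectra:
                s[k] = mono
    # Real inputs and a symmetric bin selection give real outputs, so the
    # imaginary parts discarded here are only rounding noise.
    return [[v.real for v in idft(s)] for s in spectra]

left = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]
right = [0.2, -0.4, 0.9, 0.1, -0.7, 0.6, -0.2, 0.5]
hybrid_left, hybrid_right = hybrid_signals([left, right], first_downmixed_bin=2)
```

The two hybrid signals differ only in the bins below the cutoff, so they could be passed to a conventional N-channel coder.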

4. The method as claimed in any of claims 1 to 3, 
wherein step (b) further comprises the step of gen- 
erating one or more auditory spatial parameters for 
the one or more downmixed spectral components.

5. The method as claimed in any of claims 1 to 4, 
wherein: N=2; the two input audio signals corre- 
spond to left and right input audio signals of a stereo 
input audio signal; each downmixed spectral component
is a mono spectral component; and the encoded
audio bitstream is generated using a stereo
audio coder. 

6. An apparatus for processing N input audio signals,
N>1, for encoding, comprising:

(a) one or more transforms configured to convert
each of the N input audio signals into a plurality
of spectral components in a frequency domain; and

(b) a downmixer configured, for each of one or
more, but not all, of the spectral components,
to downmix the spectral components corresponding
to the N input audio signals to generate
a downmixed spectral component, leaving
one or more of the spectral components for
each of the N input audio signals unmixed.

7. A method for decoding an encoded audio bitstream, 
comprising the steps of: 

(a) decoding the encoded audio bitstream to 
generate a plurality of spectral components in 
a frequency domain, wherein: 

one or more sets of the spectral compo- 
nents correspond to downmixed spectral 
components; and 

one or more sets of the spectral compo- 
nents correspond to unmixed spectral 
components; 

(b) for each set of the downmixed spectral com- 
ponents, applying one or more auditory spatial 
parameters to generate a synthesized spectral 
component; and 

(c) converting the synthesized spectral compo- 
nents and the unmixed spectral components in- 
to N decoded audio signals in a time domain, 
N>1. 
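
By way of illustration only, step (b) of claim 7 may be sketched for N=2 as follows. The sketch assumes that the only auditory spatial parameter is an inter-channel level difference in decibels (left relative to right) and that the gains are normalized so that the average of the two synthesized components equals the mono component; the claim fixes neither the parameter set nor the normalization. The names `synthesize_pair`, `rebuild_spectra` and `level_diffs_db` are hypothetical. Step (c), the conversion to the time domain, is not shown.

```python
def synthesize_pair(mono, level_diff_db):
    """Split one mono spectral component into left and right components.

    Assumption: level_diff_db is the left/right amplitude ratio in dB,
    and the gains satisfy (left + right) / 2 == mono.
    """
    ratio = 10 ** (level_diff_db / 20)
    gain_right = 2 / (1 + ratio)
    gain_left = ratio * gain_right
    return gain_left * mono, gain_right * mono

def rebuild_spectra(downmixed, unmixed, level_diffs_db, n_bins):
    """Assemble left and right spectra from decoded components.

    downmixed: bin index -> mono component
    unmixed: bin index -> (left component, right component)
    level_diffs_db: bin index -> level difference for that downmixed bin
    """
    left = [0j] * n_bins
    right = [0j] * n_bins
    for k, (l, r) in unmixed.items():
        left[k], right[k] = l, r
    for k, mono in downmixed.items():
        left[k], right[k] = synthesize_pair(mono, level_diffs_db[k])
    return left, right

# Bin 0 arrives unmixed; bin 1 arrives as mono with a 6 dB level difference.
left_spec, right_spec = rebuild_spectra(
    downmixed={1: 2 - 1j},
    unmixed={0: (3 + 0j, 5 + 0j)},
    level_diffs_db={1: 6.0},
    n_bins=2,
)
```

A time-difference parameter could be applied in the same place as a per-bin phase rotation; that is left out here to keep the sketch short.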

8. The method as claimed in claim 7, wherein step (a) 
comprises the steps of: 

(1) decoding the encoded audio bitstream to recover
N hybrid audio signals; and

(2) converting each of the N hybrid audio sig- 
nals into the plurality of spectral components in 
the frequency domain. 

9. The method as claimed in claim 8, wherein: N=2; 
the encoded audio bitstream is decoded using a 
stereo audio decoder; the two hybrid audio signals 
correspond to left and right hybrid audio signals of 
a hybrid stereo audio signal; and each downmixed 
spectral component is a mono spectral component. 

10. An apparatus for decoding an encoded audio bit- 
stream, comprising: 

(a) an audio decoder configured to decode the 
encoded audio bitstream to generate a plurality 
of spectral components in a frequency domain, 
wherein: 

one or more sets of the spectral compo- 
nents correspond to downmixed spectral 
components; and 

one or more sets of the spectral compo- 
nents correspond to unmixed spectral 
components;

(b) a synthesizer configured, for each set of the 
downmixed spectral components, to apply one
or more auditory spatial parameters to generate
a synthesized spectral component; and

(c) one or more inverse transforms configured 
to convert the synthesized spectral components
and the unmixed spectral components into
N decoded audio signals in a time domain,
N>1.